-
Notifications
You must be signed in to change notification settings - Fork 130
Prefer chat_template.jinja
#205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3de174c
to
81431bc
Compare
I tested this and verified with debug print statements that it works. If someone can provide an example of a repo that has a chat template stored in While I was testing this in the LLMEval app in mlx-swift-examples, I got some build errors in |
What kind of errors did you see? This function provides some overrides for the swift-transformers tokenizer configuration. e.g. for models that are not directly supported in struct TokenizerModel {
static let knownTokenizers: [String: PreTrainedTokenizerModel.Type] = [
"BertTokenizer": BertTokenizer.self,
"DistilbertTokenizer": BertTokenizer.self,
"DistilBertTokenizer": BertTokenizer.self,
"RobertaTokenizer": BPETokenizer.self,
"CodeGenTokenizer": CodeGenTokenizer.self,
"CodeLlamaTokenizer": CodeLlamaTokenizer.self,
"FalconTokenizer": FalconTokenizer.self,
"GemmaTokenizer": GemmaTokenizer.self,
"GPT2Tokenizer": GPT2Tokenizer.self,
"LlamaTokenizer": LlamaTokenizer.self,
"T5Tokenizer": T5Tokenizer.self,
"WhisperTokenizer": WhisperTokenizer.self,
"CohereTokenizer": CohereTokenizer.self,
"Qwen2Tokenizer": Qwen2Tokenizer.self,
"PreTrainedTokenizer": BPETokenizer.self,
]
Anyway, this method will substitute names: private var replacementTokenizers = [
"`InternLM2Tokenizer`": "PreTrainedTokenizer",
"Qwen2Tokenizer": "PreTrainedTokenizer",
"Qwen3Tokenizer": "PreTrainedTokenizer",
"CohereTokenizer": "PreTrainedTokenizer",
]
Anyway, let me know what the error looks like and I can help figure it out! |
These are the errors that I'm getting: private func updateTokenizerConfig(_ tokenizerConfig: Config) -> Config {
// workaround: replacement tokenizers for unhandled values in swift-transform
if let tokenizerClass = tokenizerConfig.tokenizerClass?.stringValue,
let replacement = replacementTokenizers[tokenizerClass] // Error: Cannot convert value of type 'Config' to expected argument type 'String'
{
var dictionary = tokenizerConfig.dictionary // Error: Ambiguous use of 'dictionary'
dictionary["tokenizer_class"] = replacement
return Config(dictionary)
}
return tokenizerConfig
} You can test it with this branch: https://github.com/DePasqualeOrg/mlx-swift-examples/tree/test-chat-template-jinja |
OK, it looks like the API for I think this is the equivalent -- just need to call the new methods: private func updateTokenizerConfig(_ tokenizerConfig: Config) -> Config {
// workaround: replacement tokenizers for unhandled values in swift-transform
if let tokenizerClass = tokenizerConfig.tokenizerClass?.string(),
let replacement = replacementTokenizers[tokenizerClass]
{
if var dictionary = tokenizerConfig.dictionary() {
dictionary["tokenizer_class"] = .init(replacement)
return Config(dictionary)
}
}
return tokenizerConfig
} |
Great, thank you! I created this PR, which can be merged after swift-transformers creates a new version tag and that is picked up in mlx-swift-examples. |
@DePasqualeOrg created a repo with only a |
In the repo that you created, maybe you can use a different chat template in |
That's a good idea, I agree. |
Good thinking @DePasqualeOrg, switched it out here: https://huggingface.co/FL33TW00D-HF/jinja-test/blob/main/tokenizer_config.json#L232 |
Happy to merge if you guys are @pcuenca @DePasqualeOrg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks a lot @DePasqualeOrg!
Regarding the breaking change in a previous PR, let me try to suggest some synonyms in a new PR, coming up shortly.
This still needs to be tested.